A Review on Text Sanitization

Authors

  • Veena Vasudevan
  • Ansamma John
  • M. M. Douglass
  • G. D. Clifford
  • A. Reisner
  • W. J. Long
  • G. B. Moody
  • V. T. Chakaravarthy
  • H. Gupta
  • P. Roy
Abstract

Information is essential to activities such as research and business decision making, and in the internet age it is not scarce. However, information that reveals a person's identity or discloses confidential matters is a serious threat to privacy. Sensitive information should therefore be removed or masked before documents are published or shared; this is the main goal of text sanitization. Several semi-automatic and automatic methods identify sensitive information and sanitize the document by removing such terms. Sanitization lowers a document's classification level, which widens the set of users who can access it while preserving privacy.

Similar resources

t-Plausibility: Generalizing Words to Desensitize Text

De-identified data has the potential to be shared widely to support decision making and research. While significant advances have been made in anonymization of structured data, anonymization of textual information is in its infancy. Document sanitization requires finding and removing personally identifiable information. While current tools are effective at removing specific types of information ...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations about the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

Automatic Declassification of Textual Documents by Generalizing Sensitive Terms

With the advent of the internet, large numbers of text documents are published and shared every day. Each of these documents is a collection of a vast amount of information. Publicly sharing some of this information may compromise privacy if the information is confidential. So before document publishing, sanitization operations are performed on the document for preserving the...

Detecting Sensitive Information from Textual Documents: An Information-Theoretic Approach

Whenever a document containing sensitive information needs to be made public, privacy-preserving measures should be implemented. Document sanitization aims at detecting sensitive pieces of information in text, which are removed or hidden prior to publication. Even though methods detecting sensitive structured information like e-mails, dates or social security numbers, or domain specific data like ...

Document Sanitization: Measuring Search Engine Information Loss and Risk of Disclosure for the Wikileaks cables

In this paper we evaluate the effect of a document sanitization process on a set of information retrieval metrics, in order to measure information loss and risk of disclosure. As an example document set, we use a subset of the Wikileaks Cables, made up of documents relating to five key news items which were revealed by the cables. In order to sanitize the documents we have developed a semi-auto...

Publication date: 2014